Variance Reduced Optimization


On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

Defazio, Aaron, Bottou, Leon

Neural Information Processing Systems

The application of stochastic variance reduction to optimization has shown remarkable recent theoretical and practical success. Whether these techniques can be applied to the hard non-convex optimization problems encountered during the training of modern deep neural networks is an open problem. We show that naive application of the SVRG technique and related approaches fails, and we explore why.
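
As background for the technique under study, here is a minimal SVRG sketch on a toy least-squares problem (NumPy only). It is not the paper's setup; the problem, step size, and snapshot schedule are arbitrary illustrative choices.

```python
# Minimal SVRG sketch on a toy least-squares problem (NumPy only).
# Illustrative background, not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

def grad_i(x, i):
    """Gradient of the i-th term 0.5 * (a_i^T x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    """Full-batch gradient, averaged over all n terms."""
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
lr = 0.01
for epoch in range(20):
    x_snap = x.copy()        # snapshot of the iterate
    mu = full_grad(x_snap)   # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # Variance-reduced estimate: unbiased, with variance that shrinks
        # as x approaches the snapshot (and the snapshot nears the optimum).
        v = grad_i(x, i) - grad_i(x_snap, i) + mu
        x -= lr * v
    print(epoch, np.linalg.norm(full_grad(x)))
```

The paper's question is why this correction, which works so well in the convex setting, fails to help when training deep networks.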


Reviews: On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

Neural Information Processing Systems

I'm glad you commented on the learning rate selection, because this was a major point of our discussion. The main reason I can't increase my score is that many definitions, explanations, and experiment details are missing, making it extremely hard to evaluate the real value of your experiments. This was further complicated by the fact that you didn't provide your code when submitting the paper. I hope that you will do a major revision, for example including a section in the supplementary material with all experiment details. Just in case, here are my suggestions for some extra experiments: 1. Measure and plot the effect of data augmentation on the bias of the gradient.
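
For concreteness, the toy sketch below illustrates that first suggestion: "augmentation" is modelled as additive input noise on a least-squares problem, and the bias of the SVRG-style estimate arises because the snapshot full gradient is computed under one draw of augmentations while the per-sample correction sees fresh ones. The noise model, problem, and sizes are assumptions made purely for illustration, not the setup discussed in the review.

```python
# Toy sketch: how data augmentation biases an SVRG-style gradient estimate.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 500, 10, 0.3
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

def augment(M):
    """One random 'augmented' view of the inputs (additive noise)."""
    return M + sigma * rng.normal(size=M.shape)

def grad(view, x, idx=None):
    """Average squared-error gradient over the given rows of a view."""
    if idx is None:
        idx = np.arange(len(view))
    r = view[idx] @ x - b[idx]
    return view[idx].T @ r / len(idx)

x_snap = rng.normal(size=d)             # snapshot iterate
x = x_snap + 0.1 * rng.normal(size=d)   # current iterate

# The snapshot full gradient is computed under ONE draw of augmentations...
mu = grad(augment(A), x_snap)

# ...but the per-sample correction uses FRESH augmentations, which is the
# source of the bias. Monte-Carlo estimate of the expected SVRG gradient:
est = np.zeros(d)
trials = 2000
for _ in range(trials):
    i = rng.integers(n)
    view = augment(A)
    est += grad(view, x, [i]) - grad(view, x_snap, [i]) + mu
est /= trials

# Reference: expected (augmented) gradient at the current iterate.
ref = np.mean([grad(augment(A), x) for _ in range(200)], axis=0)
print("estimated bias norm:", np.linalg.norm(est - ref))
```

Plotting this quantity at successive snapshots over training would give the curve the review asks for.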


Reviews: On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

Neural Information Processing Systems

The reviewers thought this paper provided an interesting analysis of the ineffectiveness of variance reduction methods for deep nets. However, they raised several concerns about the lack of detail on the experimental setup, especially since such details can affect the paper's conclusions. It is true that, for papers like this, there are always more experiments the authors could run to broaden or strengthen their claims. That being said, I decided to accept the paper for the following reasons: requiring a large number of experiments for a paper to be accepted to NeurIPS would preclude those with limited compute resources from having papers published; accepting this paper will thus provide a basis for further experiments by other researchers.


Local Smoothness in Variance Reduced Optimization

Vainsencher, Daniel, Liu, Han, Zhang, Tong

Neural Information Processing Systems

We propose a family of non-uniform sampling strategies to provably speed up a class of stochastic optimization algorithms with linear convergence, including Stochastic Variance Reduced Gradient (SVRG) and Stochastic Dual Coordinate Ascent (SDCA). For a large family of penalized empirical risk minimization problems, our methods exploit data-dependent local smoothness of the loss functions near the optimum, while maintaining convergence guarantees. Our bounds are the first to quantify the advantage gained from local smoothness, which is significant for some problems. Empirically, we provide thorough numerical results to back up our theory. Additionally, we present algorithms that exploit local smoothness in more aggressive ways, and these perform even better in practice.
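
To make the sampling idea concrete, here is a minimal sketch under simplifying assumptions: sampling probabilities are taken proportional to each example's global smoothness constant rather than the local estimates the paper develops, the correction is reweighted so the gradient estimate stays unbiased, and the step size is an arbitrary illustrative choice.

```python
# Sketch of SVRG with non-uniform (importance) sampling, NumPy only.
# Probabilities are proportional to per-example global smoothness ||a_i||^2;
# the paper's contribution is to exploit *local* smoothness near the optimum,
# which this toy does not attempt to estimate.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d)) * rng.uniform(0.1, 3.0, size=(n, 1))  # uneven rows
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

L = np.sum(A * A, axis=1)   # smoothness constant of 0.5 * (a_i^T x - b_i)^2
p = L / L.sum()             # sample large-constant examples more often

def grad_i(x, i):
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
lr = 0.25 / L.mean()        # illustrative step size
for epoch in range(20):
    x_snap = x.copy()
    mu = full_grad(x_snap)
    for _ in range(n):
        i = rng.choice(n, p=p)
        w = 1.0 / (n * p[i])   # reweight so the estimate remains unbiased
        v = w * (grad_i(x, i) - grad_i(x_snap, i)) + mu
        x -= lr * v
    print(epoch, np.linalg.norm(full_grad(x)))
```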

